Learning about Machine Learning: An Extended Assignment to Classify Twitter Accounts
Authors
Abstract
We describe a four-week series of assignments in an undergraduate AI course at a liberal arts college, in which students develop a supervised learning solution to the problem of classifying Twitter accounts as either person accounts or non-person accounts (e.g., organizations or spambots). The problem uses real data from an ongoing research project by the first author, yet is accessible to students with limited programming expertise. The students experienced a complete cycle of creating a machine learning solution: exploring raw data, creating a training set, engineering features, comparing different classifiers, evaluating the results, and performing error analysis. We received positive feedback from the students and intend to refine the assignment and make it available, together with the training data created, for use by the research community.
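The cycle described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy accounts, the three features, and the two classifiers (a majority-class baseline and a nearest-centroid rule) are hypothetical and are not the assignment's actual data, features, or models.

```python
import re
from collections import Counter

# Hypothetical word lists used only for this illustration.
FIRST_PERSON = {"i", "my", "me"}
ORG_WORDS = {"official", "news", "store", "deals"}

# Hypothetical hand-labeled accounts: (bio, followers, following, label).
TRAIN = [
    ("I love hiking and my dog", 150, 200, "person"),
    ("Student. My views are my own", 300, 280, "person"),
    ("Dad, teacher, baseball fan", 90, 120, "person"),
    ("Writing about things I cook", 400, 350, "person"),
    ("Official account of Acme Corp", 52000, 120, "non-person"),
    ("Breaking news and weather updates", 90000, 50, "non-person"),
    ("Win a free phone, click here", 20, 1900, "non-person"),
]
TEST = [
    ("Coffee, books, and me", 75, 90, "person"),
    ("Official store deals daily", 3000, 100, "non-person"),
    ("Local news for the valley", 8000, 300, "non-person"),
    ("I tweet about my garden", 60, 80, "person"),
]


def features(bio, followers, following):
    """Feature engineering: turn a profile into three 0/1 features."""
    words = set(re.findall(r"[a-z]+", bio.lower()))
    return [
        1.0 if words & FIRST_PERSON else 0.0,        # first-person wording
        1.0 if words & ORG_WORDS else 0.0,           # organization wording
        1.0 if followers > 10 * following else 0.0,  # broadcast-style ratio
    ]


def train_majority(rows):
    """Baseline classifier: always predict the most common training label."""
    label = Counter(row[3] for row in rows).most_common(1)[0][0]
    return lambda x: label


def train_nearest_centroid(rows):
    """Predict the label whose mean feature vector is closest to x."""
    grouped = {}
    for bio, followers, following, label in rows:
        grouped.setdefault(label, []).append(features(bio, followers, following))
    centroids = {
        label: [sum(col) / len(vecs) for col in zip(*vecs)]
        for label, vecs in grouped.items()
    }

    def predict(x):
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def accuracy(predict, rows):
    """Evaluation: fraction of rows whose predicted label matches the true one."""
    hits = sum(predict(features(b, f, g)) == label for b, f, g, label in rows)
    return hits / len(rows)


def errors(predict, rows):
    """Error analysis: return the rows the classifier gets wrong."""
    return [r for r in rows if predict(features(r[0], r[1], r[2])) != r[3]]


if __name__ == "__main__":
    for name, trainer in [("majority", train_majority),
                          ("nearest centroid", train_nearest_centroid)]:
        model = trainer(TRAIN)
        print(name, accuracy(model, TEST), errors(model, TEST))
```

Comparing a learned classifier against a trivial baseline on held-out accounts, then reading through the misclassified rows, mirrors the comparison, evaluation, and error-analysis steps the abstract lists.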
Similar resources
Forecasting Stock Price Movements Based on Opinion Mining and Sentiment Analysis: An Application of Support Vector Machine and Twitter Data
Today, social networks are fast and dynamic communication intermediaries that are a vital business tool. This study aims at examining the views of those involved with Facebook stocks so that we can summarize their views to predict the general behavior of this stock and collectively consider possible Facebook stock price movements, and create a more accurate pattern compared to previous patterns...
Finding Sensitive Accounts on Twitter: An Automated Approach Based on Follower Anonymity
We explore the feasibility of automatically finding accounts that publish sensitive content on Twitter, by examining the percentage of anonymous and identifiable followers the accounts have. We first designed a machine learning classifier to automatically determine if a Twitter account is anonymous or identifiable. We then classified an account as potentially sensitive based on the percentages ...
Exploring Twitter Hashtags
Twitter messages often contain so-called hashtags to denote keywords related to them. Using a dataset of 29 million messages, I explore relations among these hashtags with respect to co-occurrences. Furthermore, I present an attempt to classify hashtags into five intuitive classes, using a machine-learning approach. The overall outcome is an interactive Web application to explore Twitter hashtags.
Mining Anonymity: Identifying Sensitive Accounts on Twitter
We explore the feasibility of automatically finding accounts that publish sensitive content on Twitter. One natural approach to this problem is to first create a list of sensitive keywords, and then identify Twitter accounts that use these words in their tweets. But such an approach may overlook sensitive accounts that are not covered by the subjective choice of keywords. In this paper, we inst...
Fame for sale: efficient detection of fake Twitter followers
Fake followers are those Twitter accounts specifically created to inflate the number of followers of a target account. Fake followers are dangerous for the social platform and beyond, since they may alter concepts like popularity and influence in the Twittersphere—hence impacting on economy, politics, and society. In this paper, we contribute along different dimensions. First, we review some of...
Publication date: 2011